Systems and techniques for distributed and stream data mining
Abstract
Nowadays, huge amounts of electronic data are naturally collected at distributed sites. This happens either because the data have several owners or because the processes that produce them are geographically distributed. Moving the data to a single location to extract useful and actionable knowledge is usually considered unfeasible, for either policy or technical reasons. It therefore becomes mandatory to mine the data using multiple distributed resources close to the data repositories. This requires new distributed data mining (DM) algorithms and systems that return global knowledge by aggregating multiple local results. Besides the distributed nature of the data under analysis, researchers today must also consider its streaming nature, which calls for approximate online methods for mining data streams. This report introduces the requirements, design issues, and typical solutions of distributed DM algorithms. We survey some of the most important works on this topic that have recently appeared in the literature, including the latest proposals regarding P2P, highly distributed, approximate algorithms. Moreover, we discuss some of the most important proposals concerning stream DM algorithms. Such DM algorithms can be considered the building blocks for realizing seamless distributed knowledge extraction processes and systems. These systems can take advantage of recent advances in computational and data Grids, which many researchers consider the enabling technology for developing high-performance knowledge discovery processes and systems as well. In this report we analyze some significant examples of distributed and Grid-oriented knowledge discovery systems.
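The abstract names two recurring patterns without showing them concretely. In distributed DM, sites compute local results that are aggregated into global knowledge. In stream DM, approximate online methods trade exactness for bounded memory. The Python sketch below illustrates both under simplifying assumptions that are ours, not the report's. First, the "local results" are assumed to be per-item support counts merged by summation. This merge is exact for single-item supports, but it is only one of many aggregation schemes that distributed DM systems use. Second, the streaming part uses Lossy Counting (Manku and Motwani, 2002), a well-known approximate frequency-counting algorithm chosen here purely as an illustration. It is not necessarily one of the algorithms the report surveys. All function and class names (`local_counts`, `global_frequent_items`, `LossyCounter`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


# --- Distributed part (illustrative assumption): local summaries merged into global knowledge ---

def local_counts(transactions: Iterable[Iterable[Hashable]]) -> Counter:
    """Run at one site: count, per item, how many local transactions contain it.

    Only this compact summary, not the raw transactions, is sent to the coordinator.
    """
    counts: Counter = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # each item counted at most once per transaction
    return counts


def global_frequent_items(site_summaries: List[Counter], total_transactions: int,
                          min_support: float) -> Dict[Hashable, int]:
    """Run at a coordinator: sum the local counts and keep globally frequent items."""
    merged: Counter = Counter()
    for summary in site_summaries:
        merged.update(summary)  # Counter.update adds counts instead of replacing them
    threshold = min_support * total_transactions
    return {item: count for item, count in merged.items() if count >= threshold}


# --- Streaming part: Lossy Counting, a one-pass approximate frequency counter ---

class LossyCounter:
    """Estimated counts undercount true counts by at most epsilon * n."""

    def __init__(self, epsilon: float) -> None:
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = -(-1 // epsilon) if epsilon != 0 else 1  # placeholder, replaced below
        self.width = int(round(1 / epsilon)) if abs(1 / epsilon - round(1 / epsilon)) < 1e-9 else int(1 / epsilon) + 1
        self.n = 0  # number of stream elements seen so far
        self.entries: Dict[Hashable, List[int]] = {}  # item -> [count, max_error]

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = (self.n + self.width - 1) // self.width  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have been seen and pruned in earlier buckets.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # At each bucket boundary, drop entries whose count + max_error <= bucket id.
            self.entries = {k: v for k, v in self.entries.items() if v[0] + v[1] > bucket}

    def frequent(self, support: float) -> Dict[Hashable, int]:
        """Return items whose estimated count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


if __name__ == "__main__":
    # Two hypothetical sites, each holding its own transactions.
    sites = [[["a", "b"], ["a"], ["c"]], [["a", "c"], ["b", "c"]]]
    summaries = [local_counts(site) for site in sites]
    print(global_frequent_items(summaries, total_transactions=5, min_support=0.5))

    lc = LossyCounter(epsilon=0.1)
    for i in range(100):
        lc.add("x" if i % 5 < 3 else f"n{i}")  # 60 copies of "x" plus 40 singletons
    print(lc.frequent(support=0.5))
```

Only the per-site `Counter` summaries cross site boundaries, which reflects the abstract's point that raw data are not moved. Real distributed DM systems exchange much richer local models than item counts. Lossy Counting keeps at most about (1/ε) log(εN) entries after N elements. That bound is what makes a single pass over an unbounded stream feasible in limited memory.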
Similar resources
Identification of Ti- anomaly in stream sediment geochemistry using of stepwise factor analysis and multifractal model in Delijan district, Iran
In this study, 115 samples taken from the stream sediments were analyzed for concentrations of As, Co, Cr, Cu, Ni, Pb, W, Zn, Au, Ba, Fe, Mn, Sr, Ti, U, V and Zr. In order to outline mineralization-derived stream sediments, various mapping techniques including fuzzy factor score, geochemical halos and fractal model were used. Based on these models, concentrations of Co, Cr, Ni, Zn, Ba, Fe, Mn, ...
Application of continuous restricted Boltzmann machine to detect multivariate anomalies from stream sediment geochemical data, Korit, East of Iran
Anomaly separation using stream sediment geochemical data has an essential role in regional exploration. Many different techniques have been proposed to distinguish anomalous from study area. In this research, a continuous restricted Boltzmann machine (CRBM), which is a generative stochastic artificial neural network, was used to recognize the mineral potential area in Korit 1:100000 sheet, loc...
Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)
Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, decision making process is intimately related to some factors which determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated Companies, where managers ne...
Anomaly delineation of porphyry copper deposits of Hanza Region through geochemical data analyses and multispectral remote sensing
Hanza region is located in the southern part of Urumieh–Dokhtar Metallogenic belt in southeastern Iran. This region includes six known porphyry copper deposits and it is considered as an ore-bearing region from geochemical point of view. The aim of this research is to examine effective processing techniques in the analysis of stream sediment geochemical datasets and ASTER satellite images. The...
Alert correlation and prediction using data mining and HMM
Intrusion Detection Systems (IDSs) are security tools widely used in computer networks. While they seem to be promising technologies, they pose some serious drawbacks: When utilized in large and high traffic networks, IDSs generate high volumes of low-level alerts which are hardly manageable. Accordingly, there emerged a recent track of security research, focused on alert correlation, which ext...
Publication date: 2006